CiteSeerX: Intelligent Information Extraction and Knowledge Creation from Web-Based Data

Authors

Alexander G. Ororbia

Jian Wu

Abstract

To provide convenient access to this web-based data, intelligent systems such as CiteSeerX are developed to construct a knowledge base from this unstructured information. CiteSeerX does this autonomously, even leveraging utility-based feedback control to minimize computational resource usage and incorporating user input to correct automatically extracted metadata [26]. The rich metadata that CiteSeerX extracts has been used in many data mining projects. CiteSeerX provides free access to over 4 million full-text academic documents and to rarely seen functionalities, e.g., table search.

Similar resources

Big Scholarly Data in CiteSeerX: Information Extraction from the Web

We examine CiteSeerX, an intelligent system designed with the goal of automatically acquiring and organizing large-scale collections of scholarly documents from the World Wide Web. From the perspective of automatic information extraction and modes of alternative search, we examine various functional aspects of this complex system in order to investigate and explore ongoing and future research de...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

Bootstrapping an Ontology-based Information Extraction System

Automatic intelligent web exploration will benefit from shallow information extraction techniques if the latter can be brought to work within many different domains. The major bottleneck for this, however, lies in the so far difficult and expensive modeling of lexical knowledge, extraction rules, and an ontology that together define the information extraction system. In this paper we present a ...

Intelligent Health Solution System

Introduction: In the field of management, the statistics and performance of an organization's deputies and functions are always of great importance. Tracking them requires instant access to the latest status of the system under coverage and at least a minimal forecast of the future situation, in order to provide and improve quality services. All of this justifies the existence of an intelligent statistical system w...

Publication date: 2014
